Demand Forecasting
logo



Introduction

Demand forecasting in retail is defined as the act of predicting how much of a specific product or service, customers will want to purchase during a defined time period. This method of predictive analytics helps retailers understand how much stock to have on hand at a given time. Many traditional and advanced forecasting tools are available, but applying them to a large number of products or customers is not manageable. Thus, historical product demand of ABC retail custommer is used to cluster products with similar demand trends. Once product segments are identified, a representative product is choosen to illustrate the products within the segments. Thus a manageable number of forecasting models can be built to forecast the monthly demand of the customers.

Approach

The following methodolgies represent an initial procedure to forecast customer demand. It follows basic concepts and models well described in the literature and already employed for this task. Its aim is to describe a baseline upon which more informations can be added, and more complex models can be developped.

Data mining

The data found in the ‘physical_network.csv’ file represent 216725 observations for 40 variables, and span over 3 years. Before forecasting, different features were selected and data were partially cleaned for analysis as follow.

  • A feature (variable) is selected if only it is described and not redundant with the information contained in another feature.

  • For the purpose of data mining and analysis, some new features are created: + DOW: Day of the week (numeric)

  • Month: month of the year (numeric)

  • Year: year (numeric)

  • the minimum number of features is then selected for forecasting. The aim is to give a minimal complexity to the model and maximise computing time.

  • Outliers are present in the data. However, without discussion with manufacturers or consumers, it is difficult to delete these outliers without facing the possibility of deleting important informations. Thus the choice was made not to delete them. However a script is present in the document to filter them out for future development.

  • Finally, some noise were present at the start and the end of the timeseries. Not knowing if those months were fully covered, the decision is made to filter them out of the analysis, thus only retaining fully covered months.

Forecasting

Data are decomposed and checked for seasonality and trend patterns before applying forecasting. Depending of the characteristic of the data, different forecasting methods are applied and compared.

  • ARIMA (Autoregressive Integrated Moving Average)
  • ETS (Exponential Smoothing)
  • drift, and naïve

The Akaike Information Critera (AIC) is used to test the goodness of fit and the simplicity/parcimony of the models and accuracy of predictions are calculated (RMSE, MAE, MAPE, MASE). The best model is compared to the original data after splitting them in a train and test set. The test set was set to 4 months, given the high variability of the data. Only the model selected for each group is presented in this document, although the other models are present in the code script.

Results

Data mining

To avoid biais and encourage reproducibility, the retained data belong to a selection of described features, those not described in the document are pulled out of the analysis. For most features, the number of missing values is negligeable and are filtered out. Following the cleaning process, the demand (Quantity) of each product is summarised by month.

Physical Network Data selected
Request_Date SKU Brands Sold_To Ship_To Volume_MT Quantity DOW Month Year
2017-01-09 113624 Cream 2006780 70001646 2.732398 225 2 1 2017
2017-01-09 113633 Cream 2006780 70001646 9.693187 785 2 1 2017
2017-01-09 113626 Cream 2006780 70001646 1.821604 150 2 1 2017
2017-01-09 115381 Cream 2006780 70001646 3.643202 300 2 1 2017
2017-01-27 114494 Others 2006780 70001646 1.747200 168 6 1 2017

Exploratory data analysis

Sale clustering

The number of clusters is set to 4 by the three methods of clustering, giving clusters of 23, 156, 6 and 24 products in each group, respectively. The cluster plot displays 4 clusters separated, and the between_SS / total_SS ~ 60 %, suggesting the model is an average fit for the data. Specifically, 169 products are classified into group 2 and represent products only sold a small amount of time. It must be noted that 3 clusters are selected if the outliers are taken out of the analysis. It is as well, a good candidate on the plot but was not validated by the three algorithms (Elbow, Silhouette and verified by Gap statistic).

Cluster plots

optimal cluster number

Product segmentation

Products are clustered into four separate groups: 74.6% of products belong to Group 1, whose monthly transaction rate is the lowest, while Group 3 has the largest number of transactions, but the smallest number of products.

Forecast analysis

Each group is analyzed separately (~iterative process). When choosing models, it is common practice to separate the available data into two portions, training and test data, where the training data is used to estimate any parameters of a forecasting method and the test data is used to evaluate its accuracy. Because the test data is not used in determining the forecasts, it should provide a reliable indication of how well the model is likely to forecast on new data.

Group 1

The distribution of the average monthly transactions of Group 1 is right skewed. The box plot displays that the 3rd quartile is ~400, where the majority of products in this group have < 600 transactions per month. However, there are some products having monthly transactions > 900 displayed as outliers in the box plot. However, these products seem not to be sold over the last year. So, Group 1 can be considered as low demand.

Forecast

Most of the time, SKU: 115393 has zero transactions. The products in this group will have mostly similar patterns. Thus, it is unnecessary to proceed further with analysis or develop a forecasting model with this series. It describes a static series and the most appropriate way is to copy the current data to predict the future assuming the transactions will be the same as the previous year for a certain product in Group 1. Since these products haven’t been purchased for a long time, it should be considered reaching out to the customers in the previous year, to check whether we can get an estimation of future orders. Given the fact they haven’t been sold over ~1–1.5 years, it should be checked if they are still on the market.

Group 2

The distribution is slightly left-skewed in the Box plot, without outliers. Thus the median is selected as the target value. The Q3 quartile is almost 4000, all products in Group 2 is > 1500. It can be assumed that the products in Group 2 are in high demand. The representative product of this group is the product whose number of transactions is minimally different from the median (SKU: 10367598).

ARIMA

The best fit and accuracy, after decomposition, is achieved with ARIMA(0,1,12) over ETS. the RMSE and AIC are both smaller indicating better fit for the ARIMA model. The width of the confidence interval confirms the good fit of the model for Group 2.

Evaluate and iterate

Model performance

Accuracy

##                      ME     RMSE      MAE       MPE     MAPE      MASE
## Training set  -92.16192 654.6474 476.1332  5.215853 30.90045 0.5525277
## Test set     -222.29415 474.5753 441.8341 33.337634 88.38007 0.5127253
##                     ACF1 Theil's U
## Training set -0.04116408        NA
## Test set     -0.33291370  0.342128

Group 3

The distribution of monthly sales is centred for group 3, the mean will be then selected to select the representative product (SKU: 10368215). In the boxplot, the distribution is slightly right-skewed, without outliers. With a third quartile > 8000, the products within this group are in high demand.

ARIMA

Again the ARIMA(2,0,2) surpasses the ETS model and Average, Naïve & Drift Methods in terms of fit and accuracy, as demonstrated by the smaller RMSE, MAE and AIC. The width of the confidence interval is narrow, which confirms the good fit of the model for Group 3. Although the forecasts are not as accurate as for Group 2.

Evaluate and iterate

Model performance

Accuracy

##                      ME     RMSE      MAE        MPE       MAPE      MASE
## Training set   248.3984 2797.895 2411.160  -36.86844   64.17141 0.8619316
## Test set     -3363.9232 4185.216 3717.866 -765.52175 1139.01482 1.3290473
##                      ACF1 Theil's U
## Training set -0.009322887        NA
## Test set      0.156171516  6.128217

Group 4

The distribution of monthly sales is centred for group 4, but the boxplot distribution is slightly right-skewed, without outliers. With a third quartile > 8000, the products within this group are in average demand. The median is selected to estimate the representative product (SKU: 115785).

ARIMA

ARIMA(0,1,2) surpasses the ETS model once again in terms of fit and accuracy, as demonstrated by the smaller RMSE, MAE and AIC (see script). The width of the confidence interval is wider and contain the data. This indicates that the fit of the model is not as good as for the other groups.

Evaluate and iterate

Model performance

Accuracy

##                      ME     RMSE      MAE        MPE     MAPE      MASE
## Training set   39.56926 1461.351 1173.737   8.933472 126.0220 0.7570218
## Test set     -244.15004 2386.532 2317.821 -32.946365 366.1945 1.4949182
##                     ACF1 Theil's U
## Training set  0.06230742        NA
## Test set     -0.19033976 0.8935306

Limitation and improvement

Based on the analysis and prediction results, the following observations can be made:

 * The results obtained from the initial forecasts are based on naive models and need to be improved

on multiple levels. They do not fully capture trends and seasonality in the data. Other forecasting techniques would help make the model more accurate using a weighted combination of seasonality, trend, and historical values to make predictions. One could try fitting time series models that allow for the inclusion of other predictors using methods such as ARMAX or dynamic regression. These more complex models allow for control of other factors in predicting the time series. Finally, using the results of multiple models will improve risk management in the decision making process.

  • The median or mean is used here to choose a representative product of each cluster. With a fit ~60 % it is accurate to think that this product doesn’t cover the full representation among the cluster. One way to solve this issue would be to use different quartile to capture a better image of the products trends among each cluster.

  • For the forecasts herein, outliers were not filtered out from the analysis. Without the input of consumers or manufacturers, it is difficult to establish baselines to filter them out without losing informations. As for the Group 1, its products do not seem to be sold anymore over the last year. We should consider checking about their availability, to remove them (or not) from the data. Some features were not explained and therefore not integrated for forecasting. Once understood, they could be included either for clustering or forecasting if regressors were to be added.

  • The clustering step is as well biased by the presence of outliers, filtering them out reduced the cluster to 3 for instance. Improving data mining and cleaning steps will help make better predictions for clustering. Here the accuracy for clustering was only ~60 %, without the outliers clustering improved by 10 %. Better clusters will make for better predictions.

  • Historical data could be used on a smaller partition to train the model. For instance, instead of training the data on all the train set, we could select only the last few months to capture the most recent trends. This could potentially lead to better prediction. The training set should as well not encompass data of the month previous to the forecast. This will allow 1 month for the manufacturer to replenish the stock. If we want to predict for October, the train set should end in August, thus forecasting October demand could be done and leave 1 month to adapt production.

Note: New products, if incorporated in the database, will be automaticly clustered and analysed following the same methodology. The only need is to update the data.

 




A work by Bastien Rochowski